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ABSTRACT 

In the coming decade, low-frequency radio arrays will begin to probe the epoch of 
reionization via the redshifted 21-cm hydrogen line. Successful interpretation of these 
observations will require effective statistical techniques for analyzing the data. Due to 
the difficulty of these measurements, it is important to develop techniques beyond the 
standard power spectrum analysis in order to offer independent confirmation of the 
reionization history, probe different aspects of the topology of reionization, and have 
different systematic errors. In order to assess the promise of probability distribution 
functions (PDFs) as statistical analysis tools in 21-cm cosmology, we first measure the 
21-cm brightness temperature (one-point) PDFs in six different reionization simula- 
tions. We then parametrize their most distinct features by fitting them to a simple 
model. Using the same simulations, we also present the first measurements of differ- 
ence PDFs in simulations of reionization. We find that while these statistics probe 
the properties of the ionizing sources, they are relatively independent of small-scale, 
sub-grid astrophysics. We discuss the additional information that the difference PDF 
can provide on top of the power spectrum and the one-point PDF. 
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1 INTRODUCTION: COSMIC REIONIZATION 

AND 21-CM COSMOLOGY 

•i-H . 

^^ , The first stars and quasars reionized the intergalactic 

5_^ ■ medium (IGM) during the epoch of reionization (EOR) 

Cu ' IjBarkana fc Loebl 120011 ). From the Lya absorption in the 

spectra of dista nt quasars, we know th at the EOR ended at 

2 ~ 6 (see, e.g., iDiorgovski et alj2006l ). and from the cosmic 
microwave background (CMB) polariz ation maps we can in - 
fer that it started no later than z ~ 15 (jDunklev et al.ll2009l ). 
Understanding how the reionization process developed over 
time offers a way to answer some of the critical questions of 
modern cosmology concerning properties of the first sources 
of light in the universe. Among the most promising obser- 
vational probes of the EOR is the 21-cm spectral line, from 
hyperfine splitting in the ground state of hydrogen, with an 
energy of 5.9 x 10 -6 eV that corresponds to the rest-frame 
frequency of 1420 MHz. The redshifted 21-cm emission from 
neutral regions of the IGM during reionization is estimated 
to be a 1% correction to the energy density of the CMB. It is 
expected to display angular structure and frequency struc- 
ture, due to the inhomogeneities in the gas density, ionized 



fraction, s:Jj, and spin temperature (jMadau et al.lll997f ) of 
the emitting gas. 

Statistical detection of the large-scale brightness fluctu- 
ations is within the scope of a number of experiments that 
are presently being built, such as the Murchison Widefield 
Array (MWA) an d the Low Frequency A r ray (LOFAR) (for 
revie ws see, e.g., iFurlanetto et all 120061 : iBarkana fc Loebl 
12007V ). In this context, it is important to develop appropriate 
statistical tools to be employed in analyzing the incoming 
data. Such development is facilitated by the fact that the N- 
body and radiative-transfer simulations of reionization have 
begun to reach the large scales of order 100 comoving Mpc 
JMcQuinn et all l2007l : llliev et al.ll2008l : ISantos et all l2008h 
needed to capture the evolution of the IGM during the EOR. 
These simulated data cubes can be used to test various sta- 
tistical tools proposed for extracting information about the 
properties of the IGM during reionization. 

So far, studies of the statistics of the 21-cm fluctuations 
have mainly focused on the power spectrum of the bright- 
ness temperature, T b (jBowman. Morales, fc Hewittl 120061 : 
iMcQuinn et al.ll2006T ). While this statistic is fully represen- 
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The ionized fraction xi is the ratio of the number of protons to 
the total number of hydrogen atoms. 
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tative at the onset of the EOR, where the Gaussian pri- 
mordial density fluctuations drive the 21-cm fluctuations, 
it ceases to be so at later times. Namely, as the reioniza- 
tion process advances, the mapping between the hydrogen 
density and T b becomes highly non-linear (as evidenced for 
instance by the bounded domain, Xi £ [0, 1]), which results 
in non-Gaussianity of the probability distribution function 
(PDF) of T b . For this reason, various authors have started 
exploring alternative and complementary st atistics (e.g., 
iFurlanetto et all |200J ; ISaivad-Ali et al.ll200rt) . in particu- 
lar the PDFs and difference PDFs of the 21-cm brightnes s 
temperature (jBarkana fc Loebll2008l ; llchikawa et al.ll2009f) . 



We test th ese two statistic s on si x different simulated data 
cubes from iMcQuinn et al.l (|2007l ). These data cubes are re- 
sults of different astrophysical inputs that produce various 
reionization histories, all of which are allowed by the current 
observational constraints. We measure the one-point PDFs 
and difference PDFs and analyze their properties. 

The plan of the paper is as follows: in Section 12.11 
we briefly describe the simulation runs used in this pa- 
per. In Section 12.21 we then present the measured one- 
po int PDFs along with the best fits of the model proposed 
by llchikawa et al.l 1)20091 ). and discuss the main parameters 
driving the PDF shape. We next present in Section \2. 31 the 
first measurements of difference PDFs for the same set of 
simulations and analyze their properties. We conclude in 
Section [3] 



2 STATISTICS OF 21-CM FLUCTUATIONS 

2.1 Simulations 

In order to interpret future observations of the high-redshift 
universe, we need to understand the morphology of HII re- 
gions during reionization, in particular their size distribu- 
tion and how it is affected by the properties of the ionizing 
sources, gas clumping and source s uppression from photo- 
heating feedback. For this purpose, IMcQuinn et alj (|2007l ) 
ran a 1024 3 N-body simulation in a box of size 65.6h~ 1 ~ 94 
Mpc to model the density field, post-processing it using 
a suite of radiative transfer simulations. The authors as- 
sumed a standard ACDM cosmology, with n 3 — 1, (T& — 0.9, 
tt m = 0.3, «a = 0.7, n b = 0.04 and h = 0.7. The out- 
puts are stored at 50 million year intervals, roughly between 
redshifts 6 and 16. 

The radiative transfer code assumes sharp HII fronts, 
which are traced at subgrid scales. The properties of the 
sources are chosen in most cases so that reionization ends 
near z ~ 7. A soft ultraviolet spectrum that scales as v^ is 
assumed for each source. The typical luminosity of a halo of 
mass m is taken to be N(m) = 3 x 10 49 m/(10 8 Mo) ionizing 
photons per second. This corresponds to a halo star forma- 
tion rate of S(m) - /"* m/(10 10 M o ) in units of Af©yr _1 , 
for an escape fraction of / osc and a Salpeter IMF. The N- 
body simulation resolves haloes down to 1O 9 M0, but since 
the effect of smaller mass haloes cannot be neglected, the 
effect of haloes down to 10 8 Mq is included in some of the 
runs with a merger tree (see Table [TJ. 

For the purpose of measuring PDFs and difference 
PDFs , we choose six runs, labeled as in IMcQuinn et al.l 
(|2007l l: SI, S4, C5, F2, M2 and Zl. These runs differ by 



the efficiency of the sources and by the halo-mass resolu- 
tion. Some runs include feedback from photo-heating, which 
suppresses source formation within ionized regions. Others 
investigate the impact of clumping, i.e., IGM density in- 
homogeneities, and include a subgrid clumping factor C C oii 
different from unity. Finally, some runs account for the pres- 
ence of minihaloes, which are dense absorbers for ionizing 
photons and thus tend to extend the process of reioniza- 
tion. A summary of the parameters of each of the six runs is 
presented in Table [l] The list of the redshift slices for each 
data cube is sh own in Table [2] For m ore details about the 
simulations, see IMcQuinn et al.1 (|2007T l . 

2.2 One-point PDFs 

We measure the one-point PDFs of the observed brightness 
temperature T b of the redshifted 21-cm emis sion, in the six 
simulation runs from IMcQuinn et al.l (|2007r i . We measure 
T b in a 32 3 grid, i.e., T b is averaged over cells of size of 2.9 
comoving Mpc, at each of the redshifts listed in Table [2] 
Note that this scale is much larger than the simulation's 
spatial resolution; it is chosen by balancing the requirement 
to be large enough to fall close to the general range of the 
upcoming observations (corresponding to a resolution of ar- 
cminutes), with the need to be small enough compared to 
the simulation box to give a reasonable statistical sample. 
The PDFs are shown in Figure [l] 

As seen in Figure [T] the one-point PDFs are Gaussian- 
like at the highest redshifts and highly non-Gaussian at 
lower redshifts. The Gaussian shape of the PDFs at the 
beginning of reionization, when the universe is almost com- 
pletely neutral, is driven by the primordial fluctuations in 
the density field of the emitting IGM gas. At lower redshifts 
and near the end of reionization, the completely ionized gas 
does not emit at 21-cm, while the brightness temperatures 
of the leftover patches of neutral gas are still governed by 
the density field inhomogeneities. At these redshifts, entirely 
ionized cells contribute to the increasingly-dominant delta- 
function at Tt = mK, while the emission from the partially 
neutral cells maintains a Gaussian around a higher T b . The 
interplay between these two types of cells sets the shape of 
the PDFs as the reionization proceeds. 

In Figure [T] we also plot the best fit of a Gaussian + 
Exponential + (Dirac) Delta function model (GED model) 
for the 1-pt PDFs. This is an "em pirical" model ( i .e., ba sed 
on simulation data), suggested by llchikawa et al.l (|2009r ) : 
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To get a smooth curve, the values of the two functions and 
their first derivatives are matched at the brightness temper- 
ature T b — Tl, leaving (after normalization) four indepen- 
dent parameters for the GED model: the joining point of the 
exponential and the Gaussian function, Tl, the mean of the 
Gaussian, Tg, its standard deviation, og, and its maximum 
value cg- 

From the best fit of the GED model to the PDF in each 
redshift slice for each of the simulated data cubes, we obtain 
the value of the probability fraction contained in the delta- 
function (at T b s around zero), Pd, in the Gaussian (at high 
T b s), Pg, and in the exponential part of the model (which 
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Simulation Merger tree haloes N [photons sec J ] 



Comments 



51 


Yes 


2 x 10 4a M 8 


51 


No 


C S4 M g 


cs 


No 


6 x 10 49 M 8 


F2 


Yes 


2 x 10 49 M 8 


M2 


No 


9 x 10 49 M 8 


Z\ 


Yes 


1 x 10 50 M 8 



Includes only haloes with m>4x 10 10 Mq 

Structure on small scales; C ce n = 4 + 3<5c 

Includes feedback on m < Mj/2; Tgj? = 20Myr 

Includes minihaloes with m m ; n ; > 10 5 Mq 

Higher source efficiency (early reionization) 



Table 1. Details on the radiative transfer simulations from iMcQuinn et alj (J2007l ). Merger tree haloes: 'Yes' means that the halo 
resolution is supplemented with a merger tree down to 10 8 Mq. Cg4 is calibrated such that there is the same output of ionizing photons 
in each time-step as in SI. M 8 denotes the halo mass in units of 10 8 Mq. C cc n is the subgrid clumping factor, and 5c is the baryonic 
overdensity smoothed on the cell scale, rgp is the time-scale over which the cool gas in the source is converted into stars, and Mj is the 
linear-theory Jeans mass. 



Simulation 



Redshift slices 



51 6.9, 7.3, 7.7, 8.2, 8.7, 9.4, 10.1, 11.1, 12.3, 13.9, 16.2 
54 7.3, 7.7, 8.2, 8.7, 9.4, 10.1, 11.1, 12.3, 13.9, 16.2 
C5 6.6, 6.9, 7.3, 7.7, 8.2, 8.7, 9.4, 10.1, 11.1, 12.3, 13.9, 16.2 
F2 6.3, 6.6, 6.9, 7.3, 7.7, 8.2, 8.7, 9.4, 10.1, 11.1, 12.3, 13.9, 16.2 
M2 6.9, 7.3, 7.7, 8.2, 8.7, 9.4, 10.1, 11.1, 12.3, 13.9, 16.2 
Z\ 10.1,11.1,12.3,13.9,16.2 

Table 2. List of the simulation runs and corresponding redshift slices discussed in this paper. The redshift outputs for each simulation 
are spaced in 50 Myr intervals. 



interpolates between the delta function and the Gaussian), 
Pe- These parameters can be reconstructed from observa- 
tions and must sum to unity, to ensure a normalized PDF. 
The variation of Pd , Pe , Pg , Tg , Tl , and <jg with the global 
ionized fraction, Xi, is shown in Figure [2] The evolution of Si 
with redshift for each of the simulation runs is also shown. 

When we fit the GED model, while its 5d function por- 
tion is meant to capture the PDF spike near Ti, = mK 
(at low redshifts), we do not attempt to model (or resolve 
in the data) the shape of this spike. Thus, we exclude the 
lowest temperature bin before fitting the GED model, and 
derive Pd in three different ways: from the required overall 
normalization of the PDF to unity; then also directly, i.e., 
without the model fitting, from the total number count in 
the first bin of the Ti, PDF; finally, we measure the one-point 
PDFs of Xi, and estimate Pd from the number count in the 
highest bin of this PDF (the cells in this bin have xi ^ 0.95). 
These three different estimates of Pd are compared for all 
six simulation runs in Figure[3] In this Figure, we show that 
the difference in Pd calculated from the cell counts and from 
the GED model is negligible, which indicates that this model 
represents the data faithfully. The values of Pd as measured 
indirectly (from fitting the GED model) yield an accurate 
estimate of the fraction of fully ionized cells. In the limit of 
infinite resolution, Pd would equal Xi and so directly mea- 
sure the reionization history, while in reality Pd measures 



a low-resolution, smoothed-out version of the reionization 
history. 

Comparing the various simulation runs, we find that 
small-scale structure has a relatively minor effect on the 
21-cm PDF during reionization, at least for the present 
implementation of sub-grid astrophysics, and when the fi- 
nal reionization redshift is held (relatively) fixed. Compared 
to our fiducial case (SI), we have three simulations where 
mainly the small-scale structure has been adjusted: a sce- 
nario with photoheating feedback (F2), one with evaporat- 
ing minihaloes (M2), and one with increased sub-grid clump- 
ing (C5). The latter two also do not have merger-tree source 
haloes. Figures [T] and [5] show that these scenarios have the 
effect of stretching out cosmic reionization, especially by de- 
laying its progress early on, when the rarity of high-mass 
haloes makes feedback effective (F2), or the still-high density 
makes recombinations important (C5), or the minihaloes 
have not yet photo-evaporated (M2). However, in all these 
scenarios, the 21-cm PDF at a given stage (as measured by 
Xi) is fairly unchanged. In particular, the evolution of the 
probabilities Pd, Pe, Pg, and the parameters Tg and Tl is 
rather similar for these three scenarios and SI. 

Strong changes in the properties of the ionizing sources 
do have more of an effect on the evolution of the PDF. For 
example, a higher source luminosity (Zl) leads to earlier 
reionization by somewhat more massive haloes. Even early 
in reionization, the ionized bubbles are already rather large 
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Figure 1. One-point PDFs for six different simulation runs, at 
the redshifts listed in Table[2] The y-axis is the logarithm of p(T(,), 
where p represents the number fraction of points (i.e., pixels, or 
data-cube cells) at a given Tj, normalized by the temperature- 
bin size. The units of p are mK" 1 . There are 20 evenly spaced 
temperature bins in each curve. The size of a bin at each redshift 
is set by the temperature range of the cells at that redshift. The 
top PDF in each plot is at the lowest redshift of the data cube, 
and the redshift increases downward. The i-th PDF in each plot 
is shifted down the vertical axis by 2(j — 1) in logarithmic space, 
for clarity. We show the measured PDFs (solid red curves) as well 
as the best fits of the GED model (dashed blue curves). 



(compared to the fixed pixel scale at which the PDF is mea- 
sured), which changes the PDF shape and the reconstructed 
GED-model parameters. Also, since reionization in this case 
occurs at higher redshifts, when the universe is denser, the 
mean 21-cm brightness is higher, leading to higher values of 
Tq and Tl compared to the lower-redshift cases. Further- 
more, even without changing the redshift range, if the ion- 
ized sources lie in much more massive haloes (S4) than in the 
SI case, the impact on the PDFs is noticeable. In this case, 
the ionized bubbles (produced by larger and more strongly- 
clustered sources) grow larger than the effective PDF reso- 
lution quite early in reionization, so that Pd is larger than 
in the other cases (mostly at the expense of Pe), and is 
generally closer to the value of Xi. 

In summary, we find that Xi is the main parameter de- 
termining the one-point 21-cm PDFs. In particular, various 
modifications in the small-scale structure have only a mi- 
nor effect on the PDF evolution versus Xi (as quantified by 
the parameters of the GED model fits). This suggests that 
analysis of features of the observed one-point PDFs can be 
used to reconstruct the global reionization history relatively 
independently of any assumptions about the astrophysics on 
unresolved sub-grid scales. On the other hand, the typical 
mass of source haloes, and the typical reionization redshift, 
have more of an effect. It is important to note, though, that 
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Figure 2. Top panel: evolution of the global ionized fraction Xi 
with redshift z, for each of the six simulation runs. Other pan- 
els show how various parameters derived from the best-fit GED 
model to the 21-cm PDF evolve as the reionization proceeds. 
Pdj Pe-, and Pg are, respectively, the fractional probability in 
the delta-function, the exponential, and the Gaussian part of the 
PDF; Tl is the joining point of the exponential and the Gaussian, 
Tq is the mean of the Gaussian, and ctq is its variance. There are 
four free parameters in the fit to each PDF. 



observations will provide independent constraints on these 
major parameters. The redshift will obviously be measured, 
and, for example, the span of the reionization epoch will 
constrain the typical halo mass driving this process. Note 
also from Figure [2] that in all the simulation runs, the Gaus- 
sian probability Pg can be taken as a rough estimate of the 
cosmic neutral fraction, 1 — Xi . 

While the one-point PDF is interesting, it will be rather 
difficult to measure with the upcoming generation of instru- 
ments, mainly due to comparatively bright f oregrounds and 
the as sociated thermal noise. In particular, llchikawa et al.l 
l|2009f ) found that the one-point PDF can only be recon- 
structed from upcoming observations if the analysis is made 
on the basis of quite strict (and not easily tested) assump- 
tions that the real PDF is very similar in shape to that mea- 
sured in numerical simulations. This difficulty motivates the 
use of an alternative statistical tool that should have a much 
higher signal-to-noise ratio for a g iven observation, namel y 
the difference PDF proposed by iBarkana fc Loebl IJ2008T) . 
In the next Section we present the first numerical measure- 
ments of difference PDFs, specifically using the same six 
simulated data cubes, and we discuss their properties. 



2.3 Difference PDFs 

The PDF of the difference in the 21-cm brightness temper- 
atures, ATI = |T(,i — 21,2 1, at two separate points in the sky 
(or, analogously, at two cells in the sim ulated data cube) 
was suggested bv lBarkana fc Loebl (|200cf ) as a useful statis- 
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Figure 4. Difference PDFs arc shown for all the output redshifts of the 51 simulation. The x-axis is the brightness temperature difference 
AT;, for pairs of cells, and the y-axis, p&, is the number fraction of pairs of cells at a given separation, r, with a given AT}, (normalized 
by the size of the AT b bin). Different curves indicate different r bins (the legend indicates the central values of the logarithmic bins). 



tic for describing the tomography of the IGM during the 
EOR. More precisely, if we consider two points separated by 
a distance r, then the distribution of ATj, is given by the 
difference PDF pa(AT 6 ), normalized as J p A AT b = 1. The 
motivation for introducing this statistic is at least threefold. 
Firstly, the effective number of data points available for re- 
constructing the difference PDF is much larger than for the 
one-point PDF (by roughly a factor of the number of pixels 
over two, though the pixel pairs must be divided up into bins 
of distance r). Secondly, the difference PDF (which is a ma- 



jor piece of the two-point PDF) is a generalization which not 
only includes the information in the commonly considered 
power spectrum or two-point correlation function (which can 
be derived from the variance of the difference PDF), but also 
yields additional information. And thirdly, being a PDF of 
a difference in TJ,, it avoids the mean sky background and 
fits naturally with the temperature differences measured via 
interferometry. 

We present the first measurements of difference PDFs, 
for the same set of iMcQuinn et al.l (|2007l ) simulation runs 
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Figure 3. Three different measurements of Pjj are shown versus 
xi . These include Pjj as calculated from the best fit of the GED 
model (solid black curve), directly from the measured PDF of T b , 
i.e., the fraction of pixels that fall within the lowest-temperature 
bin (dotted blue curve), and directly from the measured PDF of 
Xi, i.e., the fraction of pixels that fall within the highest- Xi bin 
(dashed red curve). Results for the six different simulation runs 
are shown. 



that we used to discuss the properties of the one-point PDF 
in the previous Section. The difference PDFs for all the red- 
shifts of 51 are shown in Figure [4] For each of the other 
five simulation runs, we show difference PDFs at three rep- 
resentative redshifts in Figure [5] In Figures [4] and O every 
redshift has 6 distance bins, logarithmically spaced, and each 
distance bin has 20 temperature bins, linearly spaced. The 
range in distance is chosen so that it covers basically the 
full range from the resolution (pixel) scale to the largest 
separations found within the 94 comoving Mpc data cube. 

Even for the one-point PDF, there is no good analyt- 
ical model that matches simulations, probably because the 
PDF is sensitive to the reionization topology, specifically to 
the way in which the complex-shaped ionize d bubbles par- 
tially overlap the box-shaped pixels. This led llchikawa et al.l 
(|2009l ) to base their analysis on the PDF as measured in a 
simulation, and to consider an empirical model for fitting the 
PDF shape. Similar ly, in the case of the diff erence PDF, the 
analytical model of lBarkana fc Loebl (|2008l ) does not quan- 
titatively match the result that we find in the simulations, 
b ut we can noneth e less u se the model and the discussion 
in I Barkana fc Loebl 1)20081 ) to develop a qualitative under- 
standing of the difference PDF and how best to analyze it. 

At high redshifts, when the PDF is nearly Gaussian, the 
difference PDF (which is defined using the absolute value of 
AT},) should approximately be a half-Gaussian. Non-linear 



density fluctuations, though, give a slower decaying tail at 
high AT;, than would be expected for a pure Gaussian. As 
reionization develops with time, the difference PDF becomes 
a superposition of three contributing terms: The pixel pairs 
in which both pixels are fully ionized, those in which one 
pixel is (partly) neutral and the other ionized, and those 
where both are neutral. We explicitly show these three con- 
tributions in Figure [6] for one redshift near the midpoint 
of reionization in the ST simulation. The both-ionized pixel 
pairs basically give the Sd function at zero AT},; the amount 
of probability in this Sd function is physically meaningful, 
as discussed below. The pairs in which one pixel is neutral 
and the other ionized tend to be well separated in TJ,, and 
so this contribution is responsible for the high ATI, peak, 
at ATI, ~ 13 mK in the case plotted here. As we consider 
smaller r, the T;, values of the two points that are separated 
by r become more strongly correlated, making it difficult for 
one to be ionized and the other neutral, and so this contri- 
bution declines at smaller r. At the same time, as we re- 
duce r, this contribution becomes more highly concentrated 
at small values of AT;,, since at small separations, if one 
pixel is fully ionized, then the second one tends to be at 
least highly ionized, making their Tb difference small. Note 
also that the ionized+neutral contribution drops suddenly 
at ATI) ~ 1.5 mK, but this is due to the fact that some of the 
probability in this region that should be included under ion- 
ized+neutral is incorrectly swept up under the both-ionized 
Sd function, which is dominant at ATI, near zero. This is an 
unavoidable effect of the finite (2.9 Mpc) resolution of our 
gridded field. Since in practice we define a "fully ionized" 
pixel as having an ionization fraction of 95% or higher, some 
highly ionized pixels are classified as fully ionized; a higher 
resolution map, with a definition closer to 100% ionization, 
would move this artificial drop-off closer to ATt = 0. Those 
pairs in which both pixels are neutral peak at ATI, = 0, and 
give a contribution with a roughly half-Gaussian shape, at 
all separations r. 

The difference PDF as a function of separation transi- 
tions between two limits. The r — ¥ oo limit corresponds to 
two uncorrelated points, for which j>a is essentially a con- 
volution of the one-point PDF p with itself. As long as r is 
large enough to maintain a weak correlation, pa keeps its 
large-r limit and only varies slowly as r is decreased. Once r 
becomes small enough for a significant correlation (which is 
positive in the physical regimes considered here) , it becomes 
harder to produce a large difference AT;, between the two 
correlated points (as noted above, even when one is fully ion- 
ized and the other is not); pa thus becomes more strongly 
concentrated near ATb — 0, approaching a Sd function at 
ATt = as r -> 0. 

The correlation between cells, referred to above, probes 
different physical effects at different redshifts. Before reion- 
ization, this is the density correlation, which arises from 
large-scale modes in the initial fluctuations. During reion- 
ization, the correlation of Tb is dominated by ionization, so 
that points close enough together to be in the same ionized 
bubble (or in strongly correlated nearby bubbles) will have 
strongly correlated 21-cm brightness temperatures. Thus, by 
inspecting the plots, one can make a rough estimate of the 
average size of an ionized bubble at low redshifts, or the typ- 
ical density fluctuation correlation length at high redshifts. 
This effective correlation length is the first separation bin 
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Figure 6. Separate contributions to the difference PDF, for the 
SI simulation at z = 7.3. We show the contribution of pixel pairs 
in which both cells are fully ionized (dashed curves, capturing the 
<5 rj> functions at zero AT}, ) , pairs in which one cell is fully ionized 
(dot-dashed curves), and pairs in which neither of the cells is fully 
ionized (dotted curves). The total pa (solid curves) is the sum of 
these three contributions. 
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Figure 5. This Figure is a smaller version of Figure|4] difference 
PDFs for the rest of the simulation runs are shown for three 
representative redshifts (i.e., early, mid, and late reionization). 
The legend is the same as in Figure [4] 



at which the difference PDF at high ATI, drops significantly 
below its value at larger separations. For example, in Fig- 
ure|4j the average bubble size can be seen to increase beyond 
10 comoving Mpc during the late stages of reionization. 

As in the case of the one-point PDF, the difference 
PDF is relatively insensitive to variations in the small-scale 
sub-grid physics, as tested by the various simulation runs. 
Figure [7] displays a comparison of the difference PDFs for 
the six different reionization runs, for various comoving dis- 
tance bins, at the redshift where Xi is closest to the value of 
0.4 (i.e., in the midst of the reionization process). We also 
show the corresponding one-point PDFs, for easy compari- 
son. Similarly to p (as discussed in the previous section), the 
large ionized bubbles and correlation length in the case of 



reionization by massive, rare sources stretch pa out to higher 
values of AT}, (as seen in the Zl run a nd, especially, S4). Ou r 
findings are consistent with those of lMcQuinn et al.l ( 20071 ) 
and with the general theoretical expectation that clustered 
groups of gala xies determine the spat ial distribution of ion- 
ized bubbles (JBarkana fc Loebl 120041 ) which is then driven 
by large-scale modes and is mainly sensitive to the overall 
bias of the ionizing sources. 

We next proceed to measure the parameter analogous 
to Pd, but this time for the case of difference PDFs. This 
parameter, which we denote APd, represents the (number) 
fraction of pairs for which ATi, w 0. In the reality of hav- 
ing limited resolution, it is the fraction of pairs for which 
ATi, falls within the lowest temperature bin. Ideally, this 
value would directly measure the fraction of pairs in which 
both cells are fully ionized, but the finite resolution adds a 
contribution from pairs that do not satisfy this condition, 
but nonetheless have matching brightness temperatures to 
within the bin size. 

Just as Pd in the one-point PDF measures a low- 
resolution version of the reionization history, so can APd 
be considered as measuring a l ow-resolution version o f the 
ionization correlation function (|Barkana fc Loebl 120081 ) . In 
particular, in the limit of infinitely high resolution, APd 
would precisely measure the joint ionization probability of 
two points as a function of their separation. In this case, we 
would expect APd to vary from the corresponding value of 
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difference PDFs (lower panel), for all six simulation runs, at r = 
16.2 comoving Mpc. Each simulation run is shown at the rcdshift 
for which the value of the global ionized fraction xi is closest to 
0.4, i.e. in the midst of the reionization process. 



Pd, at r — >■ 0, down to Pp, at r — > oo (where each pixel 
in the pair is independently ionized with probability Pd)- 
While these relations are not exact with finite resolution, 
they do provide a rough guide for what to expect. We show 
the value of APd in Figure |8j for the redshifts of each sim- 
ulation at which the Sd function at ATb ~ is visible. A 
comparison with the corresponding values of Pd (shown in 
Figure [3]) shows that the above theoretical behavior is sat- 
isfied only approximately, since the fully ionized pairs make 
up only a fraction of APd- Still, the Figure shows the flat 
asymptote of APd at large r, and its rise as r drops below 
the correlation length (although r does not quite reach small 
enough values to see the flat r — > asymptote). 

We have illustrated how the difference PDF encodes in- 
formation about the EOR, in particular by separating out 
information on the ionization correlations (unlike the power 
spectrum analysis, in which the ionization and density corre- 
lations are mixed together). We leave for future work a more 
quantitative analysis of the features of the difference PDF 
and their relation to the properties of the ionizing sources 
and the reionization history. 



3 SUMMARY AND CONCLUSIONS 

Upcoming low-frequency radio observations will use the 
MWA, LOFAR, and similar instruments to survey the sky 
for redshifted 21-cm emission. It is therefore important to 
develop statistical tools that can be used to extract infor- 
mation about the history of reionization from these observa- 
tions. In this paper, we examine PD Fs and difference PDFs , 
using six different simulations from iMcQuinn et al.l (|2007l ). 
We show that the PDFs are highly non-Gaussian in the 
midst of the reionization process, and are thus a complemcn- 
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Figure 8. APd parameter, measured from difference PDFs as 
the value of the zero-temperaturc-difference bin, for the SI sim- 
ulation, is shown for the redshifts where it is non-vanishing. It 
represents the fractional probability in the delta function around 
AT;, Rt 0, i.e. the number fraction of pairs of cells which arc either 
both fully ionized, or both at the same brightness temperature. 
We see that this value tends to asymptote to the corresponding 
Pd, at r ~ 0, as expected for fully correlated cells. The thin lines 
of the same type and color represent the part of the corresponding 
APjj that originates from pairs where both cells are fully ionized. 



tary statistic to the commonly discussed power spectrum. As 
a way to analyze the PDFs, we examine the evolution of the 
parameters of the best-fit GED model with the global ion- 
ized fraction Xi. In particular, the 5d function portion of 
the probability (Pd) measures a low-resolution, smoothed- 
out version of the reionization history (i.e., x% as a function 
of redshift). 

We also present the first numerical measurements of dif- 
ference PDFs; specifically, we meas ure difference PDFs fo r 
the same set of simulation runs from lMcQuinn et al.l (|2007l) . 
We argue that the larger data set and the nature of this 
statistic can be significant advantages in the presence of 
bright foregrounds. The difference PDF can be physically 
understood as arising from three contributions: pixel pairs 
in which both, one, or neither of the pixels is ionized. As an 
illustration of the information that can be deduced from the 
difference PDFs, we consider the typical correlation length, 
which corresponds to the average size of an ionized bubble 
during reionization, or the typical density fluctuation corre- 
lation length, at the onset of the EOR. The difference PDF 
also has a delta- function portion (APd), which measures a 



low-resolution, smoothed-out version of the ionization cor- 
relation function at each redshift. 

We find that increasing small-scale clumping, and in- 
cluding photoheating feedback or minihaloes has only a 
small effect on the one-point and difference PDFs (consid- 
ered at a given Xi), at least within the range of assump- 
tions covered by the simulations that we considered. On the 
other hand, the PDFs are highly sensitive to the proper- 
ties of the ionizing sources, so that measuring them can 
help distinguish between reionization driven by large ver- 
sus small haloes and help us unveil information about the 
first sources o f light in the un i verse. These conclusions par- 
allel those of iMcQuinn et all (120071 ). highlighting the fun- 
damental fact that the spatial structure of reionization is 
driven by large-scale modes and d epends mainly on the 
overall bias of the ion izing sources (JBarkana fc Loebl 12004 ; 
iFurlanetto et alj|2004h . 

We plan to more precisely quantify the properties of 
difference PDF and establish the relation between their fea- 
tures and the properties of the IGM during the EOR. It 
would also be interesting to explore the PDFs and difference 
PDFs in alternative reionization scenarios, such as those 
dominated by X-ray sources or a decaying dark matter par- 
ticle. Contrasting different scenarios may lead to a fuller un- 
derstanding of the information content of the specific PDF 
shapes that we measured. 
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